Search CORE

78 research outputs found

Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost

Author: Saha Barna
Subramanian Sanjay
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual European Symposium on Algorithms (ESA 2019)
Publication date: 01/01/2019
Field of study

Several clustering frameworks with interactive (semi-supervised) queries have been studied in the past. Recently, clustering with same-cluster queries has become popular. An algorithm in this setting has access to an oracle with full knowledge of an optimal clustering, and the algorithm can ask the oracle queries of the form, "Does the optimal clustering put vertices u and v in the same cluster?" Due to its simplicity, this querying model can easily be implemented in real crowd-sourcing platforms and has attracted a lot of recent work. In this paper, we study the popular correlation clustering problem (Bansal et al., 2002) under the same-cluster querying framework. Given a complete graph G=(V,E) with positive and negative edge labels, correlation clustering objective aims to compute a graph clustering that minimizes the total number of disagreements, that is the negative intra-cluster edges and positive inter-cluster edges. In a recent work, Ailon et al. (2018b) provided an approximation algorithm for correlation clustering that approximates the correlation clustering objective within (1+epsilon) with O((k^{14} log{n} log{k})/epsilon^6) queries when the number of clusters, k, is fixed. For many applications, k is not fixed and can grow with |V|. Moreover, the dependency of k^14 on query complexity renders the algorithm impractical even for datasets with small values of k. In this paper, we take a different approach. Let C_{OPT} be the number of disagreements made by the optimal clustering. We present algorithms for correlation clustering whose error and query bounds are parameterized by C_{OPT} rather than by the number of clusters. Indeed, a good clustering must have small C_{OPT}. Specifically, we present an efficient algorithm that recovers an exact optimal clustering using at most 2C_{OPT} queries and an efficient algorithm that outputs a 2-approximation using at most C_{OPT} queries. In addition, we show under a plausible complexity assumption, there does not exist any polynomial time algorithm that has an approximation ratio better than 1+alpha for an absolute constant alpha > 0 with o(C_{OPT}) queries. Therefore, our first algorithm achieves the optimal query bound within a factor of 2. We extensively evaluate our methods on several synthetic and real-world datasets using real crowd-sourced oracles. Moreover, we compare our approach against known correlation clustering algorithms that do not perform querying. In all cases, our algorithms exhibit superior performance

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Renting a Cloud

Author: Saha Barna
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2013)
Publication date: 01/01/2013
Field of study

We consider the problem of efficiently scheduling jobs on data centers to minimize the cost of renting machines from "the cloud." In the most basic cloud service model, cloud providers offer computers on demand from large pools installed in data centers. Clients pay for use at an hourly rate. In order to minimize cost, each client needs to decide on the number of machines to be rented and the duration of renting each machine. This suggests the following optimization problem, which we call Rent Minimization. There is a set J={j_1,j_2,...,j_n} of n jobs. Job j_i is released at time r_i >= 0, has a deadline of d_i, and requires p_i>0 contiguous processing time, r_i,d_i,p_i in R. The jobs need to be scheduled on identical parallel machines. Machines may be rented for any length of time; however, the cost of renting a machine for l>=0 time units is [l/D] (the smallest integer >= l/D) dollars, for some given large real D; in particular, one pays dollar 2 whether the machine is rented for D+1 or 2D time units. The goal is to schedule all the jobs in a way that minimizes the incurred rental cost. In this paper, we develop offline and online algorithms for Rent Minimization problem. The algorithms achieve a constant factor approximation for the offline version and O(log(p_max/p_min)) for the online version, where p_max and p_min are the maximum and minimum processing time of the jobs respectively. We also show that no deterministic online algorithm can achieve an approximation factor better than log_{3}(p_max/p_min) within a constant factor. Both of these algorithms use the well-studied problem of Machine Minimization as a subroutine. Machine Minimization is a special case of Rent Minimization where D = max_{i}{d_i}. In the process of solving the Rent Minimization problem, in this paper, we also develop the first online algorithm for Machine Minimization

Dagstuhl Research Online Publication Server

Energy Efficient Scheduling via Partial Shutdown

Author: Kuller Samir
Li Jian
Saha Barna
Publication venue
Publication date: 07/12/2009
Field of study

Motivated by issues of saving energy in data centers we define a collection of new problems referred to as "machine activation" problems. The central framework we introduce considers a collection of

m

machines (unrelated or related) with each machine

i

having an {\em activation cost} of

a_i

. There is also a collection of

n

jobs that need to be performed, and

p_{i,j}

is the processing time of job

j

on machine

i

. We assume that there is an activation cost budget of

A

-- we would like to {\em select} a subset

S

of the machines to activate with total cost

a(S) \le A

and {\em find} a schedule for the

n

jobs on the machines in

S

minimizing the makespan (or any other metric). For the general unrelated machine activation problem, our main results are that if there is a schedule with makespan

T

and activation cost

A

then we can obtain a schedule with makespan \makespanconstant T and activation cost \costconstant A, for any

\epsilon >0

. We also consider assignment costs for jobs as in the generalized assignment problem, and using our framework, provide algorithms that minimize the machine activation and the assignment cost simultaneously. In addition, we present a greedy algorithm which only works for the basic version and yields a makespan of

2T

and an activation cost

A (1+\ln n)

. For the uniformly related parallel machine scheduling problem, we develop a polynomial time approximation scheme that outputs a schedule with the property that the activation cost of the subset of machines is at most

A

and the makespan is at most

(1+\epsilon) T

for any

\epsilon >0

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server